Skip to content

buckets: document CopyFile operation for storage buckets #2375

Merged
mishig25 merged 5 commits into main from docs/bucket-copyfile-operation
Apr 13, 2026

Collaborator

@mishig25 mishig25 commented Apr 12, 2026

Summary

  • Document the copy files operation for storage buckets
  • Show CLI (hf buckets cp) and Python (api.copy_files) usage
  • Note that only Xet-tracked files can be copied server-side
  • Update the "Data processing pipelines" use case to reference the new copy capability

🤖 Generated with Claude Code


Note

Low Risk
Low risk documentation-only change that adds guidance for hf buckets cp/HfApi.copy_files; no runtime or API behavior is modified.

Overview
Adds a new “Copying files between repos and buckets” section to the storage buckets docs, describing server-side copying of Xet-tracked content from Hub repos or other buckets into a destination bucket.

Includes CLI (hf buckets cp) and Python (HfApi.copy_files) examples, plus a note that only Xet-tracked files copy server-to-server (non-Xet files are downloaded and re-uploaded) and that source read + destination write access is required.
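The copy rule summarized above can be illustrated with a small standalone sketch. It does not call `huggingface_hub`; `SourceFile`, `check_access`, and `plan_copy` are hypothetical names invented here, and the real `hf buckets cp` / `HfApi.copy_files` may differ in detail. It only models the two statements in the overview: Xet-tracked files copy server-to-server while other files are downloaded and re-uploaded, and the caller needs source read plus destination write access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical description of one file in a source repo or bucket."""
    path: str
    xet_tracked: bool


def check_access(can_read_source: bool, can_write_destination: bool) -> None:
    """Raise if the caller lacks source read or destination write access."""
    if not can_read_source:
        raise PermissionError("read access to the source is required")
    if not can_write_destination:
        raise PermissionError("write access to the destination bucket is required")


def plan_copy(files: list[SourceFile]) -> dict[str, list[str]]:
    """Split files by how they would be transferred (assumed rule from the PR text)."""
    return {
        # Xet-tracked content can be copied server-to-server.
        "server_side": [f.path for f in files if f.xet_tracked],
        # Everything else falls back to download and re-upload.
        "reupload": [f.path for f in files if not f.xet_tracked],
    }


if __name__ == "__main__":
    files = [
        SourceFile("model.safetensors", xet_tracked=True),
        SourceFile("README.md", xet_tracked=False),
    ]
    check_access(can_read_source=True, can_write_destination=True)
    print(plan_copy(files))
```

The split is kept as a pure function so the transfer plan can be inspected before any data moves; an actual copy would go through the documented CLI or Python call rather than this sketch.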

Reviewed by Cursor Bugbot for commit 0a40880.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mishig25 mishig25 changed the title from "docs: document CopyFile operation for storage buckets" to "buckets: document CopyFile operation for storage buckets" Apr 12, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mishig25 mishig25 force-pushed the docs/bucket-copyfile-operation branch from 649ff55 to 1e0ec73 Compare April 12, 2026 20:20
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@mishig25 mishig25 requested a review from Wauplin April 13, 2026 08:32

Buckets serve as staging areas for data processing workflows. Process raw data, write intermediate outputs to a bucket, then promote the final artifact to a versioned [Dataset](./datasets) repository when the pipeline completes. This keeps your versioned repo clean while giving your pipeline fast mutable storage.

Note that transferring data from a Bucket to a repository without reuploading is not yet available, but is on the roadmap.
Member

that comment is still valid, @mishig25

Member

(reminder to please review your agentic PRs before setting them as ready to review)

Collaborator Author

@mishig25 mishig25 Apr 13, 2026

> that comment is still valid, @mishig25

handled in #2375 (comment)

> (reminder to please review your agentic PRs before setting them as ready to review)

I did review all the PRs I submitted: this PR, #2376, and #2377. As proof, you will see force-pushes and/or follow-up commits I made after the initial PR, and hand-edited PR descriptions in all three. I only mark them as ready after my own review, so that the implementors of the features can catch inaccuracies, such as the one above, that I missed in my review or understanding of the feature.

Member

@julien-c julien-c left a comment

cc @davanstrien too

Member

@davanstrien davanstrien left a comment

few small nit suggestions

Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>
Contributor

@Wauplin Wauplin left a comment

Thanks for the ping :)

@mishig25 mishig25 merged commit ccc7b66 into main Apr 13, 2026
3 checks passed
@mishig25 mishig25 deleted the docs/bucket-copyfile-operation branch April 13, 2026 12:41
6 participants